
131 research outputs found

Mixed-precision Cholesky QR factorization and its case studies on multicore CPU with multiple GPUs

Author: Dongarra Jack
Tomov Stanimire
Yamazaki Ichitaro
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 12/05/2015

The University of Manchester - Institutional Repository

Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations

Author: Dongarra Jack
Tomov Stanimire
Yamazaki Ichitaro
Publication venue: Hindawi Limited
Publication date: 01/01/2015

Low-rank matrices arise in many scientific and engineering computations. Both the computational and storage costs of manipulating such matrices may be reduced by taking advantage of their low-rank properties. To compute a low-rank approximation of a dense matrix, in this paper, we study the performance of QR factorization with column pivoting or with restricted pivoting on multicore CPUs with a GPU. We first propose several techniques to reduce the postprocessing time, which is required for restricted pivoting, on a modern CPU. We then examine the potential of using a GPU to accelerate the factorization process with both column and restricted pivoting. Our performance results on two eight-core Intel Sandy Bridge CPUs with one NVIDIA Kepler GPU demonstrate that the GPU can reduce the factorization time by more than a factor of two. In addition, to study the performance of our implementations in practice, we integrate them into StruMF, a recently developed software package that algebraically exploits such low-rank structures to solve a general sparse linear system of equations. Our performance results for solving Poisson's equations demonstrate that the proposed techniques can significantly reduce the preconditioner construction time of StruMF on the CPUs, and that the GPU reduces the construction time by a further 10%–50%.

Directory of Open Access Journals

The University of Manchester - Institutional Repository
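The abstract above builds low-rank approximations from QR factorization with column pivoting. As a minimal CPU-only sketch, assuming NumPy and a target rank k no larger than the numerical rank of A, k steps of column-pivoted modified Gram-Schmidt produce A[:, perm] ≈ Q R. The function name truncated_qrcp is hypothetical; this is neither LAPACK's xGEQP3 nor the paper's restricted-pivoting GPU implementation.

```python
import numpy as np


def truncated_qrcp(A, k):
    """Rank-k approximation A[:, perm] ~= Q @ R from k steps of QR with
    column pivoting, written as column-pivoted modified Gram-Schmidt.

    Teaching sketch (hypothetical helper), not LAPACK xGEQP3 or the paper's
    code. Assumes k does not exceed the numerical rank of A.
    """
    W = np.array(A, dtype=float)   # working copy; trailing columns become residuals
    m, n = W.shape
    perm = np.arange(n)
    Q = np.zeros((m, k))
    R = np.zeros((k, n))
    for j in range(k):
        # Pivot: bring the trailing column with the largest residual norm to position j.
        p = j + int(np.argmax(np.linalg.norm(W[:, j:], axis=0)))
        W[:, [j, p]] = W[:, [p, j]]
        R[:, [j, p]] = R[:, [p, j]]
        perm[[j, p]] = perm[[p, j]]
        # Normalize the pivot column to obtain the next orthonormal basis vector.
        R[j, j] = np.linalg.norm(W[:, j])
        Q[:, j] = W[:, j] / R[j, j]
        # Remove the new direction from every trailing column (rank-1 update).
        R[j, j + 1:] = Q[:, j] @ W[:, j + 1:]
        W[:, j + 1:] -= np.outer(Q[:, j], R[j, j + 1:])
    return Q, R, perm
```

Each step recomputes the norms of all trailing columns before it can choose a pivot; on parallel hardware that global reduction at every step is the per-step communication that makes QRCP costly.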

Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems

Author: Dongarra Jack
Iqbal Zafar
Nooshabadi Saeid
Tomov Stanimire
Yamazaki Ichitaro
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2021

Convex optimization solvers are widely used in embedded systems that require sophisticated optimization algorithms, including model predictive control (MPC). In this paper, we aim to reduce the online solve time of such convex optimization solvers, so as to reduce the total runtime of the algorithm and make it suitable for real-time convex optimization. We exploit a property of the Karush–Kuhn–Tucker (KKT) matrix involved in solving the problem: only some parts of the matrix change during the solver's iterations. Our results show that the proposed method can effectively reduce the runtime of the solvers.

Michigan Technological University

Directory of Open Access Journals
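The abstract says only that some parts of the KKT matrix stay fixed across iterations; it does not say which. As a minimal sketch, assume the KKT system has the form [[H, Aᵀ], [A, -D]] [x; y] = [r1; r2], with H symmetric positive definite and A fixed, and only the diagonal block D changing. Eliminating x = H⁻¹(r1 - Aᵀy) from the second block row gives (A H⁻¹ Aᵀ + D) y = A H⁻¹ r1 - r2, so H can be factored and A H⁻¹ Aᵀ formed once, leaving a small m × m solve per iteration. The block layout and the class name BlockKKTSolver are hypothetical, not the paper's method.

```python
import numpy as np


class BlockKKTSolver:
    """Solves [[H, A^T], [A, -D]] [x; y] = [r1; r2] repeatedly.

    Assumed (hypothetical) structure: H (symmetric positive definite) and A
    stay fixed across iterations; only the diagonal block D = diag(d) changes.
    """

    def __init__(self, H, A):
        self.A = np.asarray(A, dtype=float)
        self.L = np.linalg.cholesky(H)         # H = L L^T, factored once
        self.HinvAt = self._h_solve(self.A.T)  # H^{-1} A^T, computed once
        self.S0 = self.A @ self.HinvAt         # A H^{-1} A^T, computed once

    def _h_solve(self, rhs):
        # Solve H z = rhs with the cached Cholesky factor. np.linalg.solve does
        # not exploit triangularity; scipy.linalg.solve_triangular would.
        return np.linalg.solve(self.L.T, np.linalg.solve(self.L, rhs))

    def solve(self, d, r1, r2):
        # Per-iteration work: one small m x m Schur-complement system.
        Hinv_r1 = self._h_solve(r1)
        S = self.S0 + np.diag(d)                # A H^{-1} A^T + D
        y = np.linalg.solve(S, self.A @ Hinv_r1 - r2)
        x = Hinv_r1 - self.HinvAt @ y           # x = H^{-1} (r1 - A^T y)
        return x, y
```

The design choice is to move every operation that touches only the unchanging blocks into the constructor, so repeated calls to solve pay only for the part of the matrix that actually varies.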

Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs

Author: Dongarra Jack
Kurzak Jakub
Luszczek Piotr
Mary Théo
Tomov Stanimire
Yamazaki Ichitaro
Publication venue: Association for Computing Machinery (ACM)
Publication date: 15/11/2015

A low-rank approximation of a dense matrix plays an important role in many applications. To compute such an approximation, a common approach uses the QR factorization with column pivoting (QRCP). Though the reliability and efficiency of QRCP have been demonstrated, this deterministic approach requires costly communication at each step of the factorization. Since such communication is becoming increasingly expensive on modern computers, an alternative approach based on random sampling, which can be implemented using communication-optimal kernels, is becoming attractive. To study its potential, in this paper, we compare the performance of random sampling with that of QRCP on an NVIDIA Kepler GPU. Our performance results demonstrate that random sampling can be up to 12.8× faster than the deterministic approach for computing an approximation of the same accuracy. We also present the parallel scaling of random sampling over multiple GPUs on a single compute node, showing a speedup of 3.8× over three Kepler GPUs. These results demonstrate the potential of random sampling as an excellent computational tool for many applications, and its potential is likely to grow on emerging computers with increasing communication costs.

Crossref
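The random-sampling alternative to QRCP can be sketched with the standard randomized range finder: multiply A by a Gaussian test matrix, orthonormalize the sample, and take an SVD of the small projected matrix. This is a generic sketch assuming NumPy; the oversampling of 10 columns, the single power iteration, and the function name randomized_low_rank are illustrative choices, not parameters taken from the paper.

```python
import numpy as np


def randomized_low_rank(A, k, oversample=10, power_iters=1, seed=0):
    """Rank-k approximation A ~= U @ diag(s) @ Vt by random sampling (sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + oversample, n)
    Omega = rng.standard_normal((n, l))   # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for the sampled range
    for _ in range(power_iters):          # optional power iterations sharpen spectral decay
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                           # small l x n projection of A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub[:, :k]                     # lift the left singular vectors back to R^m
    return U, s[:k], Vt[:k, :]
```

Apart from the small SVD, the work consists of matrix-matrix products with A and QR factorizations of tall, skinny matrices, and no step needs a pivot decision that depends on the previous one.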

Accelerating linear system solutions using randomization technique

Author: Baboulin Marc
Dongarra Jack
Herrmann Julien
Tomov Stanimire
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/02/2013

We illustrate how linear algebra calculations can be enhanced by statistical techniques in the case of a square linear system Ax = b. We study a random transformation of A that enables us to avoid pivoting and thereby reduce the amount of communication. Numerical experiments show that this randomization can be performed at a very affordable computational price while providing satisfactory accuracy compared with partial pivoting. This random transformation, called the Partial Random Butterfly Transformation (PRBT), is optimized in terms of data storage and flop count. We propose a solver in which PRBT and the LU factorization with no pivoting take advantage of current hybrid multicore/GPU machines, and we compare its Gflop/s performance with that of a solver implemented in a current parallel library.

HAL-ENS-LYON

HAL-CentraleSupelec

HAL - Lille 3

INRIA a CCSD electronic archive server

Hal-Diderot
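A minimal sketch of the randomized solve described in the last abstract, assuming NumPy: transform A into Uᵀ A V with two random recursive butterfly matrices U and V of depth 2, factor the result by LU without pivoting, and recover x = V y. Each butterfly is B = (1/√2)[[R0, R1], [R0, -R1]] with random diagonal R0 and R1; drawing their entries as exp(r/10) with r uniform on [-1/2, 1/2] is an assumption here, and all function names are hypothetical. The butterflies are formed as dense matrices for clarity, so this sketch does not realize PRBT's storage and flop savings, and the paper's solver may include steps (such as iterative refinement) not shown.

```python
import numpy as np


def butterfly(n, rng):
    """Dense n x n butterfly (1/sqrt(2)) [[R0, R1], [R0, -R1]] with random diagonals."""
    h = n // 2
    # Assumed distribution: entries exp(r/10), r uniform on [-1/2, 1/2].
    r0 = np.exp(rng.uniform(-0.5, 0.5, h) / 10)
    r1 = np.exp(rng.uniform(-0.5, 0.5, h) / 10)
    B = np.block([[np.diag(r0), np.diag(r1)], [np.diag(r0), -np.diag(r1)]])
    return B / np.sqrt(2)


def recursive_butterfly(n, depth, rng):
    """Product of `depth` levels; level l is block-diagonal with 2**l butterflies."""
    assert n % (2 ** depth) == 0, "n must be divisible by 2**depth"
    W = np.eye(n)
    for level in range(depth):
        size = n // (2 ** level)
        level_mat = np.zeros((n, n))
        for start in range(0, n, size):
            level_mat[start:start + size, start:start + size] = butterfly(size, rng)
        W = level_mat @ W  # deeper levels are applied on the left
    return W


def lu_nopiv_solve(M, c):
    """Solve M z = c by Gaussian elimination WITHOUT pivoting (hypothetical helper)."""
    M = np.array(M, dtype=float)
    z = np.array(c, dtype=float)
    n = M.shape[0]
    for k in range(n - 1):
        M[k + 1:, k] /= M[k, k]                                    # multipliers (column of L)
        M[k + 1:, k + 1:] -= np.outer(M[k + 1:, k], M[k, k + 1:])  # Schur-complement update
    for i in range(n):              # forward substitution with unit lower-triangular L
        z[i] -= M[i, :i] @ z[:i]
    for i in range(n - 1, -1, -1):  # back substitution with upper-triangular U
        z[i] = (z[i] - M[i, i + 1:] @ z[i + 1:]) / M[i, i]
    return z


def prbt_solve(A, b, depth=2, seed=0):
    """Solve A x = b via (U^T A V) y = U^T b, then x = V y."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    U = recursive_butterfly(n, depth, rng)
    V = recursive_butterfly(n, depth, rng)
    y = lu_nopiv_solve(U.T @ A @ V, U.T @ b)
    return V @ y
```

With A[0, 0] = 0, plain LU without pivoting breaks down at its very first step, whereas the randomized matrix Uᵀ A V can, with high probability, be factored without any row interchanges.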